From the GDM website: ‘The R package gdm implements Generalized Dissimilarity Modeling(Ferrier et al. 2007) to analyze and map spatial patterns of biodiversity. GDM models biological variation as a function of environment and geography using distance matrices – specifically by relating biological dissimilarity between sites to how much sites differ in their environmental conditions (environmental distance) and how isolated they are from one another (geographical distance). […] GDM also can be used to model other biological levels of organization, notably genetic (Fitzpatrick and Keller 2015) [..] and the approaches for doing so are largely identical to the species-level case with the exception of using a different biological dissimilarity metric depending on the type of response variable.’
2 Formatting data
2.1 Genomic data
2.1.1 Pairwise \(F_{ST}\) matrices
The GDM analysis takes as inputs matrices of pairwise \(F_{ST}\).
To estimate the pairwise \(F_{ST}\), we use the individual-level (i.e. allele counts for each genotype) genomic data with missing data (i.e. no imputation of the missing data), and without minor allele frequencies.
Code
# we load the allele counts of each genotypegeno <-read.csv(here("data/genomic_data/filtered_allele_counts_withoutmaf.csv"),row.names =1)geno[geno ==1] <-12geno[geno ==2] <-22geno[geno ==0] <-11geno <- geno %>%t() %>%as.data.frame() %>% dplyr::mutate(pop=substr(row.names(.), 0, 3)) %>% dplyr::select(pop, everything())geno[1:10,1:10] %>%kable_mydf(boldfirstcolumn = T)
fst_mat <- geno %>% dplyr::select(pop,all_of(cand)) %>%pairwise.WCfst(diploid=TRUE)# save itfst_mat %>%saveRDS(file=here("outputs/GDM/fst_matrix.rds"))
2.1.2 Scaling and formatting the \(F_{ST}\) matrix
The pairwise \(F_{ST}\) matrix contains 60 negative values. It generally means there is more variation within than among populations and is likely to result from uneven sample sizes.
We standardize the \(F_{ST}\) matrices between 0 and 1 as follows:
We do not have to scale the climatic data before the GDM analysis, as scaling of predictors is part of model fitting in GDM, as said in Mokany et al. (2022): ‘As different predictors are measured on different scales (e.g., temperature in degrees, precipitation in mm), they are transformed as part of model fitting, such that the transformed distance between a pair of sites for different predictors can be meaningfully compared and combined.’
We have to use a projected coordinate system to calculate the geographic distances between sites. Therefore, we reproject the geographic (spheroid) CRS of the population coordinates (with units in degrees longitude and latitude) to a projected (two-dimensional; cartesian) CRS (typically with units of meters from a datum). We will use the CRS EPSG:3035.
Code
# Steps:# extract the geographic coordinates of the populations# transform the coordinates in spatial points and specify the CRS (WGS84) with sp package# transform to a SpatVector object for the terra package# reproject with terra package in CRS EPSG:3035# extract population coordinates from the SpatVector with terra packagepop_coord_proj <-read_csv(here("data/selected_populations_GOanalyses.csv"), show_col_types =FALSE) %>% dplyr::select("longitude", "latitude") %>% sp::SpatialPoints(proj4string =CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) %>% terra::vect() %>% terra::project("EPSG:3035") %>% terra::crds() %>%as_tibble() %>% dplyr::rename(long_EPSG_3035=x, lat_EPSG_3035=y)# merge the population coordinates with the climatic variablesclim_past <-bind_cols(clim_past,pop_coord_proj)list_clim_fut <-lapply(list_clim_fut, function(clim_fut) bind_cols(clim_fut,pop_coord_proj))
3 GDM fitting
Combining genomic and climatic data with formatsitepair
We use the formatsitepair function to combine the genomic and climatic data into population-pair table format. In our case, we use a pairwise biological dissimilarity matrix (i.e. pairwise \(F_{ST}\)) as response variable, and therefore we have to set bioFormat=3 in the formatsitepair function.
Warning! The rows and columns of the distance matrix have to be in the same order as the rows of the climatic variables. And the distance matrix and the climatic data must not include NAs.
GDM fitting
From the GDM website: ’GDM is a nonlinear extension of permutational matrix regression that uses flexible splines and generalized linear modeling (GLM) to accommodate two types of nonlinearity common in ecological datasets: (1) variation in the rate of compositional turnover (non-stationarity) along environmental gradients, and (2) the curvilinear relationship between biological distance and environmental and geographical distance.
Generalized dissimilarity models are fitted with the gdm function.
Different options are available:
geo=T to specify that the model should be fit with geographical distance.
splines an optional vector specifying the the number of I-spline basis functions (the default is three, with larger values producing more complex splines).
knots an optional vector specifying the locations of “knots” along the splines (defaults to 0 (minimum), 50 (median), and 100 (maximum) quantiles when three I-spline basis functions are used). E
From the GDM website: ‘Even though these option are available, using the default values for these parameters will work fine for most applications. In other words, unless you have a good reason, you should probably use the default settings for splines and knots. The effects (and significance) of altering the number of splines and knot locations has not been systematically explored.’
Observed genetic distance vs predicted climatic distance
Code
# Plot predicted climatic distance vs observed genetic distanceoverlay_x <-seq(from=min(gdm_mod$ecological), to=max(gdm_mod$ecological), length=length(gdm_mod$ecological) )overlay_y <-1-exp( - overlay_x )dfline <-tibble(x=overlay_x,y=overlay_y) p <-tibble(clim=gdm_mod$ecological,obs=gdm_mod$observed) %>%ggplot(aes(x=clim,y=obs)) +geom_point(color=rgb(0,0,1,0.5)) +geom_line(data=dfline,aes(x=x,y=y), color="gray60") +xlim(c(0,1.6)) +ylim(c(0,1)) +xlab("Predicted climatic distance") +ylab("Observed genetic distance") +theme_bw()# Can be saved for the Supplementary Informationp %>%ggsave(filename =here("figs/GDM/PredictedClimaticDistancevsObservedGeneticDistancePlots_SI.pdf"), device="pdf", width=5, height=5)p
Predicted versus observed genetic distance
Code
p <-tibble(obs=gdm_mod$observed, # can also be extracted with x$gdm_tab$distancepred=gdm_mod$predicted) %>%ggplot(aes(x=obs,y=pred)) +geom_abline(intercept =0, slope =1, color="gray60") +geom_point(color=rgb(0,0,1,0.5)) +xlim(c(0,1)) +ylim(c(0,1)) +xlab("Observed genetic distance") +ylab("Predicted genetic distance") +theme_bw()# We save a figure for the Supplementary Informationp %>%ggsave(filename =here("figs/GDM/PredictedvsObservedGeneticDistancePlots_SI.pdf"), device="pdf", width=5, height=5)p
I-splines
From the GDM website: ‘The fitted I-splines provide an indication of how population genetic composition changes along each climatic gradient. They are one of the most informative components of a fitted GDM and so plotting and scrutinizing the splines is a major part of interpreting GDM and the analyzed biological patterns.’
I-splines are shown for predictors with non-zero coefficients, i.e. predictors with a relationship with the genetic distance.
I-spline interpretation: (from the GDM website) ’The maximum height of each spline indicates the magnitude of total genetic change along that gradient and thereby corresponds to the relative importance of that predictor in contributing to allelic turnover while holding all other variables constant (i.e., is a partial climatic distance). The spline’s shape indicates how the rate of genetic change varies with position along that gradient. Thus, the splines provide insight into the total magnitude of genetic change as a function of each gradient and where along each gradient those changes are most pronounced.
We want to predict the expected degree of maladaptation in units of the response variable (\(F_{ST}\)) for each population.
6.1.1 Using spatial points
We create a dataset with:
the genetic distance fixed to O. Indeed, we want the disruption of the current gene-climate relationships under future climates, so we assume that the genetic composition is the same between current and future climate to do this calculation.
weights are fixed to 1.
Code
# dataframe with reference climatic datapop_tab_pred_clim_past <- clim_past %>% dplyr::select(-pop) %>% dplyr::rename(xCoord=long_EPSG_3035,yCoord=lat_EPSG_3035) %>%set_colnames(paste0("s1.",colnames(.)))# list of dataframe with future climatic data for each GCMlist_pop_tab_pred <- list_clim_fut %>%lapply(function(clim_fut){clim_fut <- clim_fut %>% dplyr::select(-pop,-gcm) %>% dplyr::rename(xCoord=long_EPSG_3035,yCoord=lat_EPSG_3035) %>%set_colnames(paste0("s2.",colnames(.)))bind_cols(pop_tab_pred_clim_past, clim_fut) %>%mutate(distance=0,weights=0) %>% dplyr::select(distance,weights,contains("Coord"), everything())})# calculate the genomic offset for each GCMgo <-lapply(list_pop_tab_pred, function(pop_tab_pred){predict(gdm_mod,pop_tab_pred)})
Code
# Function to plot the relationship between the Euclidean climatic distance and the GDM genetic offsetsource(here("scripts/functions/make_eucli_plot.R"))# Calculate the Euclidean climatic distancelist_dist_env <- list_clim_fut %>%lapply(function(clim_fut){Delta = clim_past %>% dplyr::select(-pop,-contains("EPSG")) - clim_fut %>% dplyr::select(-pop,-gcm,-contains("EPSG")) dist_env =sqrt( rowSums(Delta^2) )})# Main gene pools (for the figures)gps <-readRDS(here("data/genomic_data/main_gp_info.rds"))[[1]] %>%arrange(Population)
# Function to make the genomic offset mapssource(here("scripts/functions/make_go_map.R"))# Population coordinatespop_coord <-read_csv(here("data/selected_populations_GOanalyses.csv"), show_col_types =FALSE) %>% dplyr::select(pop ,longitude, latitude)# Find minimum and maximum values of genomic offset for the mapsgo_limits <-lapply(names(list_dist_env), function(gcm){go[[gcm]]}) %>%unlist() %>%range()# Generate the maps for each set of SNPs and each GCMgo_maps <-lapply(names(list_clim_fut), function(gcm){make_go_map(dfcoord=pop_coord,snp_set = x,gcm=gcm,ggtitle=gcm,go_limits = go_limits,point_size =3)})legend_maps <-get_legend(go_maps[[1]])go_maps <-lapply(go_maps, function(y) y +theme(legend.position ="none"))go_maps$legend_maps <- legend_mapsgo_maps <-plot_grid(plotlist=go_maps)# We can save the figures for the Supplementary Informationggsave(here("figs/GDM/GOMaps_PopLocations_SI.pdf"), go_maps, width=10,height=6, device="pdf")go_maps
6.1.2 Using rasters
Code
# Past rasterspath <-here("../../GOPredEvalPinpin/GOPredEvalPinpin/data/ClimaticData/ClimateDTRasters/1km_1901-1950_Extent-JulietteA/")tif_paths <-lapply(clim_var, function(x) paste0(path,"/",x,".tif"))past_rasts <- raster::stack(tif_paths) %>%projectRaster(crs ="EPSG:3035")list_pred_rasts <-lapply(names(list_clim_fut), function(gcm){ # for each GCM# Future rasters path <-here(paste0("../../GOPredEvalPinpin/GOPredEvalPinpin/data/ClimaticData/ClimateDTRasters/1km_",gcm,"_2041-2070_ssp370_Extent-JulietteA/"))tif_paths <-lapply(clim_var, function(x) paste0(path,"/",x,".tif"))fut_rasts <- raster::stack(tif_paths)fut_rasts <-projectRaster(fut_rasts, crs ="EPSG:3035")predict(gdm_mod, past_rasts, # raster with current climate at 2.5 minutes resolutiontime=TRUE, fut_rasts) # Rasters with future climates at 2.5 minutes resolution}) %>%setNames(names(list_clim_fut))list_pred_rasts %>%saveRDS(here("outputs/GDM/go_pred_rasters.rds"))
6.1.3 Corr btw predictions with rasters or spatial points
We check that the genomic offset predictions we obtained with the spatial points are highly correlated (also similar) to those obtained with the rasters.
Ferrier, Simon, Glenn Manion, Jane Elith, and Karen Richardson. 2007. “Using Generalized Dissimilarity Modelling to Analyse and Predict Patterns of Beta Diversity in Regional Biodiversity Assessment.”Diversity and Distributions 13 (3): 252–64.
Fitzpatrick, Matthew C, and Stephen R Keller. 2015. “Ecological Genomics Meets Community-Level Modelling of Biodiversity: Mapping the Genomic Landscape of Current and Future Environmental Adaptation.”Ecology Letters 18 (1): 1–16.
Mokany, Karel, Chris Ware, Skipton NC Woolley, Simon Ferrier, and Matthew C Fitzpatrick. 2022. “A Working Guide to Harnessing Generalized Dissimilarity Modelling for Biodiversity Analysis and Conservation Assessment.”Global Ecology and Biogeography 31 (4): 802–21.